voting behavior
Persona-driven Simulation of Voting Behavior in the European Parliament with Large Language Models
Kreutner, Maximilian, Lutz, Marlene, Strohmaier, Markus
Large Language Models (LLMs) display remarkable capabilities to understand or even produce political discourse, but have been found to consistently display a progressive left-leaning bias. At the same time, so-called persona or identity prompts have been shown to produce LLM behavior that aligns with socioeconomic groups that the base model is not aligned with. In this work, we analyze whether zero-shot persona prompting with limited information can accurately predict individual voting decisions and, by aggregation, accurately predict positions of European groups on a diverse set of policies. We evaluate if predictions are stable towards counterfactual arguments, different persona prompts and generation methods. Finally, we find that we can simulate voting behavior of Members of the European Parliament reasonably well with a weighted F1 score of approximately 0.793. Our persona dataset of politicians in the 2024 European Parliament and our code are available at https://github.com/dess-mannheim/european_parliament_simulation.
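The abstract above reports a weighted F1 score. As a reminder of what that metric measures, here is a minimal sketch in plain Python. It assumes the usual definition (per-class F1, averaged with weights equal to each class's share of the true labels). The function name `weighted_f1` and the toy vote labels are illustrative and not taken from the paper's code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its share of y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    support = Counter(y_true)  # number of true instances per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        # The denominator is at least n >= 1, because tp + fn == n.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / total) * f1
    return score


# Toy data (invented): four votes with three possible outcomes.
true_votes = ["for", "for", "against", "abstain"]
predicted_votes = ["for", "against", "against", "abstain"]
print(weighted_f1(true_votes, predicted_votes))
```

Because the weights follow class frequency, frequent outcomes dominate the score, which matters when one vote option is much more common than the others.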
Framework of Voting Prediction of Parliament Members
Mizrahi, Zahi, Berkovitz, Shai, Talmon, Nimrod, Fire, Michael
Keeping track of how lawmakers vote is essential for government transparency. While many parliamentary voting records are available online, they are often difficult to interpret, making it challenging to understand legislative behavior across parliaments and predict voting outcomes. Accurate prediction of votes has several potential benefits, from simplifying parliamentary work by filtering out bills with a low chance of passing to refining proposed legislation to increase its likelihood of approval. In this study, we leverage advanced machine learning and data analysis techniques to develop a comprehensive framework for predicting parliamentary voting outcomes across multiple legislatures. We introduce the Voting Prediction Framework (VPF), a data-driven framework designed to forecast parliamentary voting outcomes at the individual legislator level and for entire bills. VPF consists of three key components: (1) Data Collection: gathering parliamentary voting records from multiple countries using APIs, web crawlers, and structured databases; (2) Parsing and Feature Integration: processing and enriching the data with meaningful features, such as legislator seniority and content-based characteristics of a given bill; and (3) Prediction Models: using machine learning to forecast how each parliament member will vote and whether a bill is likely to pass. The framework will be open source, so anyone can use or modify it. To evaluate VPF, we analyzed over 5 million voting records from five countries: Canada, Israel, Tunisia, the United Kingdom and the USA. Our results show that VPF achieves up to 85% precision in predicting individual votes and up to 84% accuracy in predicting overall bill outcomes. These findings highlight VPF's potential as a valuable tool for political analysis, policy research, and enhancing public access to legislative decision-making.
United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections
von der Heyde, Leah, Haensch, Anna-Carolina, Wenz, Alexander
Large language models (LLMs) are perceived by some as having the potential to revolutionize social science research, considering their training data includes information on human attitudes and behavior. If these attitudes are reflected in LLM output, LLM-generated "synthetic samples" could be used as a viable and efficient alternative to surveys of real humans. However, LLM-synthetic samples might exhibit coverage bias due to training data and fine-tuning processes being unrepresentative of diverse linguistic, social, political, and digital contexts. In this study, we examine to what extent LLM-based predictions of public opinion exhibit context-dependent biases by predicting voting behavior in the 2024 European Parliament elections using a state-of-the-art LLM. We prompt GPT-4-Turbo with anonymized individual-level background information, varying prompt content and language, ask the LLM to predict each person's voting behavior, and compare the weighted aggregates to the real election results. Our findings emphasize the limited applicability of LLM-synthetic samples to public opinion prediction. We show that (1) the LLM-based prediction of future voting behavior largely fails, (2) prediction accuracy is unequally distributed across national and linguistic contexts, and (3) improving LLM predictions requires detailed attitudinal information about individuals for prompting. In investigating the contextual differences of LLM-based predictions of public opinion, our research contributes to the understanding and mitigation of biases and inequalities in the development of LLMs and their applications in computational social science.
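The comparison described above (individual predictions, aggregated with weights, then set against real results) can be sketched as follows. This is a minimal illustration under stated assumptions: the party labels, weights, and vote shares are invented, and mean absolute error is one plausible comparison measure, not necessarily the one the study uses. The helper names `weighted_shares` and `mean_absolute_error` are hypothetical.

```python
from collections import defaultdict


def weighted_shares(predictions, weights):
    """Turn per-person predicted party choices into weighted vote shares."""
    if len(predictions) != len(weights):
        raise ValueError("predictions and weights must have equal length")
    totals = defaultdict(float)
    for party, weight in zip(predictions, weights):
        totals[party] += weight
    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return {party: w / total_weight for party, w in totals.items()}


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual shares over all parties."""
    parties = set(predicted) | set(actual)
    gaps = (abs(predicted.get(p, 0.0) - actual.get(p, 0.0)) for p in parties)
    return sum(gaps) / len(parties)


# Invented example: four synthetic respondents with survey weights.
predicted_votes = ["X", "Y", "X", "Z"]
survey_weights = [1.0, 2.0, 1.0, 1.0]
actual_result = {"X": 0.5, "Y": 0.3, "Z": 0.2}  # hypothetical election shares

shares = weighted_shares(predicted_votes, survey_weights)
print(shares)
print(mean_absolute_error(shares, actual_result))
```

Running the same comparison separately per country or prompt language would expose the kind of unequal accuracy across contexts that the abstract describes.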
Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion
von der Heyde, Leah, Haensch, Anna-Carolina, Wenz, Alexander
The recent development of large language models (LLMs) has spurred discussions about whether LLM-generated "synthetic samples" could complement or replace traditional surveys, considering their training data potentially reflects attitudes and behaviors prevalent in the population. A number of mostly US-based studies have prompted LLMs to mimic survey respondents, with some of them finding that the responses closely match the survey data. However, several contextual factors related to the relationship between the respective target population and LLM training data might affect the generalizability of such findings. In this study, we investigate the extent to which LLMs can estimate public opinion in Germany, using the example of vote choice. We generate a synthetic sample of personas matching the individual characteristics of the 2017 German Longitudinal Election Study respondents. We ask the LLM GPT-3.5 to predict each respondent's vote choice and compare these predictions to the survey-based estimates on the aggregate and subgroup levels. We find that GPT-3.5 does not predict citizens' vote choice accurately, exhibiting a bias towards the Green and Left parties. While the LLM captures the tendencies of "typical" voter subgroups, such as partisans, it misses the multifaceted factors swaying individual voter choices. By examining the LLM-based prediction of voting behavior in a new context, our study contributes to the growing body of research about the conditions under which LLMs can be leveraged for studying public opinion. The findings point to disparities in opinion representation in LLMs and underscore the limitations in applying them for public opinion estimation.
How They Vote: Issue-Adjusted Models of Legislative Behavior
We develop a probabilistic model of legislative data that uses the text of the bills to uncover lawmakers' positions on specific political issues. Our model can be used to explore how a lawmaker's voting patterns deviate from what is expected and how that deviation depends on what is being voted on. We derive approximate posterior inference algorithms based on variational methods. Across 12 years of legislative data, we demonstrate both improvement in heldout predictive performance and the model's utility in interpreting an inherently multi-dimensional space.
Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias
Lee, S., Peng, T. Q., Goldberg, M. H., Rosenthal, S. A., Kotcher, J. E., Maibach, E. W., Leiserowitz, A.
Large language models (LLMs) have demonstrated their potential in social science research by emulating human perceptions and behaviors, a concept referred to as algorithmic fidelity. The LLMs were conditioned on demographics and/or psychological covariates to simulate survey responses. The findings indicate that LLMs can effectively capture presidential voting behaviors but encounter challenges in accurately representing global warming perspectives when relevant covariates are not included. GPT-4 exhibits improved performance when conditioned on both demographics and covariates. However, disparities emerge in LLM estimations of the views of certain groups, with LLMs tending to underestimate worry about global warming among Black Americans. While highlighting the potential of LLMs to aid social science research, these results underscore the importance of meticulous conditioning, model selection, survey question format, and bias assessment when employing LLMs for survey simulation. Further investigation into prompt engineering and algorithm auditing is essential to harness the power of LLMs while addressing their inherent limitations.
Keywords: Global warming; large language models; algorithmic fidelity; public opinion
1. Introduction
It is very important to measure public opinion about global warming, as these opinions can have considerable influence over policy-making decisions (Bromley-Trujillo & Poe, 2020) and shape public behavior (Doherty & Webler, 2016). A primary method employed by scholars and policymakers for measuring and assessing these opinions is through representative surveys (Berinsky, 2017). However, the extensive time and financial resources required for these surveys can hinder the timely tracking of evolving public opinions about global warming. Resource constraints can also lead to an unintended bias towards majority opinions, potentially neglecting the perspectives of minority groups due to their typically smaller sample sizes in national representative surveys. Nonetheless, understanding diverse public opinion regarding global warming is also vital for climate justice. This understanding can promote equitable decision-making, elevate the concerns of vulnerable communities, help align climate policies with democratic principles, build public support, and address disparities in climate change awareness and priorities. Furthermore, understanding the diversity of public opinion can help support a just transition and mobilize support for climate justice initiatives.
Tech Companies Are Taking Action on AI Election Misinformation. Will it Matter?
The announcement comes a day after Microsoft announced it was also taking a number of steps to protect elections, including offering tools to watermark AI-generated content and deploying a "Campaign Success Team" to advise political campaigns on AI, cybersecurity, and other related issues. Next year will be the most significant year for elections so far this century, with the U.S., India, the U.K., Mexico, Indonesia, and Taiwan all headed to the polls. Although many are concerned about the impact deepfakes and misinformation could have on elections, many experts stress the evidence for their impacts on elections so far is limited at best. Experts welcome the measures taken by tech companies to defend election integrity but say more fundamental changes to political systems will be required to tackle misinformation. Tech companies have come under scrutiny after the role they played in previous elections.
Navigating The Ethical Perils And Promises Of Artificial Intelligence
The Biden administration announced Wednesday a new website dedicated to artificial intelligence, a place where people can stay up-to-date on the federal government's developments in artificial intelligence. This news comes as Dr. Peter Hershock from the East-West Center prepares for a virtual discussion on the subject Thursday. Hershock is the director of the center's Asian Studies Development Program and recently released a book about artificial intelligence, "Buddhism and Intelligent Technology: Toward a More Humane Future." Hershock says our development of artificial intelligence will eventually lead to the ethical singularity, "a point at which evaluating competing value systems and conceptions of humane intelligence take on infinite value/significance." In his book, he looks back at historic schools of thought such as Confucianism, Buddhism and Socratic philosophy for insight that can help navigate the conflicting values of artificial intelligence.
Modeling Voters in Multi-Winner Approval Voting
Scheuerman, Jaelle, Harman, Jason, Mattei, Nicholas, Venable, K. Brent
In many real world situations, collective decisions are made using voting and, in scenarios such as committee or board elections, employing voting rules that return multiple winners. In multi-winner approval voting (AV), an agent submits a ballot consisting of approvals for as many candidates as they wish, and winners are chosen by tallying up the votes and choosing the top-$k$ candidates receiving the most approvals. In many scenarios, an agent may manipulate the ballot they submit in order to achieve a better outcome by voting in a way that does not reflect their true preferences. In complex and uncertain situations, agents may use heuristics instead of incurring the additional effort required to compute the manipulation which most favors them. In this paper, we examine voting behavior in single-winner and multi-winner approval voting scenarios with varying degrees of uncertainty using behavioral data obtained from Mechanical Turk. We find that people generally manipulate their vote to obtain a better outcome, but often do not identify the optimal manipulation. There are a number of predictive models of agent behavior in the COMSOC and psychology literature that are based on cognitively plausible heuristic strategies. We show that the existing approaches do not adequately model real-world data. We propose a novel model that takes into account the size of the winning set and human cognitive constraints, and demonstrate that this model is more effective at capturing real-world behaviors in multi-winner approval voting scenarios.
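The multi-winner approval rule described above (tally the approvals, then take the top-k candidates) can be sketched in a few lines. This is a minimal illustration, not the authors' code. The alphabetical tie-break is an assumption added here so the output is deterministic; the abstract does not specify a tie-breaking rule. The function name `approval_winners` and the ballots are invented.

```python
from collections import Counter


def approval_winners(ballots, k):
    """Return the k candidates with the most approvals.

    Each ballot is a collection of approved candidates. Ties are broken
    alphabetically by candidate name (an assumption for this sketch).
    """
    tally = Counter()
    for ballot in ballots:
        tally.update(set(ballot))  # each voter approves a candidate at most once
    ranked = sorted(tally.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:k]]


# Invented example: four voters, four candidates, two seats.
ballots = [{"A", "B"}, {"B", "C"}, {"B"}, {"A", "C", "D"}]
print(approval_winners(ballots, k=2))  # ['B', 'A']
```

In this example B has three approvals, while A and C tie on two, so the tie-break decides the second seat. A voter who prefers C could change the outcome by dropping an approval for A, which is the kind of ballot manipulation the abstract studies.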
Understanding Voting Outcomes through Data Science
After the surprising results of the 2016 presidential election, I wanted to better understand the socio-economic and cultural factors that played a role in voting behavior. With the election results in the books, I thought it would be fun to reverse-engineer a predictive model of voting behavior based on some of the widely available county-level data sets. For example, if you want to answer the question "how could the election have been different if the percentage of people with at least a bachelor's degree had been 2% higher nationwide?" you can simply toggle that parameter up to 1.02 and click "Submit" to find out. The predictions are driven by a random forest classification model that has been tuned and trained on 71 distinct county-level attributes. Using real data, the model has a predictive accuracy of 94.6% and an ROC AUC score of 96%.